Channel: PyData
Category: Science & Technology
Tags: python, learn to code, education, software, pydata, learn, coding, how to program, julia, opensource, scientific programming, numfocus, python 3, tutorial
Description: Bodo: Supercomputing-Like Performance and Scale for Python/Pandas

Speaker: Ehsan Totoni

Summary
Bodo is a new compute engine using a novel JIT inferential compiler technology that brings supercomputing-like performance and scalability to native Python analytics code. Bodo automatically parallelizes Python/Pandas code, allowing applications to scale to 10,000+ cores and petabytes of data, and is orders of magnitude faster than alternatives such as Spark and Dask.

Description
Python is often praised for its simplicity but criticized for low performance and scalability. Bodo is a new compute engine that brings supercomputing-like performance and scalability to native Python analytics code. Bodo automatically parallelizes Python/Pandas code, allowing applications to scale to 10,000+ cores and petabytes of data without any rewrites into Scala, C++, or non-native APIs. This makes Python the best solution for challenging data engineering tasks like ETL, data prep, and featurization. It is made possible by a new just-in-time (JIT) inferential compiler technology that can automatically perform optimizations that usually require the effort of world-class performance experts. We will discuss how this technology works, present examples and benchmarks, and explain why it is orders of magnitude faster than alternatives such as Spark and Dask.

Ehsan Totoni's Bio
Ehsan is an entrepreneur, computer science researcher, and software engineer working on the democratization of High Performance Computing (HPC) for data analytics/AI/ML. Ehsan received his PhD in computer science from the University of Illinois at Urbana-Champaign, working on various aspects of HPC and parallel computing. He then worked as a research scientist at Intel Labs and Carnegie Mellon University, focusing on programming systems that address the gap between programmer productivity and computing performance.
Ehsan drives Bodo’s mission to democratize HPC for data science, believing that it should not remain the bastion of a select few experts. As Moore’s Law begins to slow, Ehsan’s vision is to apply parallel computing principles to sustain computing’s evolution.

Twitter: twitter.com/EhsanTn
LinkedIn: linkedin.com/in/ehsan-totoni-44928286

PyData Global 2021
Website: pydata.org/global2021
LinkedIn: linkedin.com/company/pydata-global
Twitter: twitter.com/PyData

pydata.org
PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add timestamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/YouTubeVideoTimestamps